In this notebook, a template is provided for you to implement, in stages, the functionality required to successfully complete this project. If additional code is needed that cannot be included in the notebook, make sure the Python code is successfully imported and included in your submission. Sections that begin with 'Implementation' in the header indicate where you should begin your implementation. Note that some implementation sections are optional, and will be marked with 'Optional' in the header.
In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.
Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can typically be edited by double-clicking the cell to enter edit mode.
Visualize the German Traffic Signs Dataset. This is open ended, some suggestions include: plotting traffic signs images, plotting the count of each sign, etc. Be creative!
The pickled data is a dictionary with 4 key/value pairs:
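At minimum, the loading code below relies on the `features` and `labels` entries of that dictionary. A quick way to check the structure is to inspect the keys directly; the sketch below builds a miniature stand-in file (hypothetical path and shapes, just for illustration) rather than the real dataset:

```python
import os
import pickle
import tempfile

import numpy as np

# Miniature stand-in for the pickled dataset files (hypothetical shapes),
# showing the structure the loading code relies on: at least a 'features'
# array of images and a matching 'labels' array.
sample = {
    'features': np.zeros((2, 32, 32, 3), dtype=np.uint8),
    'labels': np.array([0, 1], dtype=np.uint8),
}
path = os.path.join(tempfile.mkdtemp(), 'train.p')
with open(path, 'wb') as f:
    pickle.dump(sample, f)
with open(path, 'rb') as f:
    train = pickle.load(f)
print(sorted(train.keys()))       # -> ['features', 'labels']
print(train['features'].shape)    # -> (2, 32, 32, 3)
```

The real files expose more keys than this minimal example; the same `sorted(train.keys())` call reveals them.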
# Load pickled data
import pickle
import numpy as np
import cv2
training_file = "../traffic-signs-data/GTSRB_size32/train.p"
testing_file = "../traffic-signs-data/GTSRB_size32/test.p"
with open(training_file, mode='rb') as f:
    train = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)
X_train, y_train = train['features'], train['labels']
X_test, y_test = test['features'], test['labels']
### To start off let's do a basic data summary.
# Number of training examples
n_train = len(train['features'])
# Number of testing examples
n_test = len(test['features'])
# What's the shape of an image?
image_shape = train['features'][0].shape
# How many classes are in the dataset
n_classes = np.max(train['labels']) + 1
print("Number of training examples =", n_train)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
### Data exploration visualization goes here.
### Feel free to use as many code cells as needed.
import matplotlib.pyplot as plt
# Show typical image for every class and the number of items in it.
nb_per_class = np.zeros((n_classes,), dtype=int)
for i in range(n_classes):
    nb_per_class[i] = np.sum(train['labels'] == i)
    idxes = np.argwhere(train['labels'] == i)
    plt.imshow(train['features'][idxes[20][0]])
    plt.title('Class %i: %i images.' % (i+1, nb_per_class[i]))
    plt.show()
# Graph of the number of items per class
plt.plot(nb_per_class, 'bo')
plt.ylabel('Nb of training images')
plt.show()
# Some blurry(!) images
nb_plots = 4
delta = 17900
for i in range(nb_plots):
    plt.subplot(1, nb_plots, i+1)
    plt.imshow(train['features'][delta+i])
plt.show()
Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.
There are various aspects to consider when thinking about this problem:
Here is an example of a published baseline model on this problem. It's not required to be familiar with the approach used in the paper, but it's good practice to try to read papers like these.
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.
For this second problem, I decided to use the recent TensorFlow-Slim library, built on top of TensorFlow. Similar to Keras, it provides a simpler interface to quickly define, train and test neural networks (and especially CNNs).
Although I had previous experience with Keras, I chose TF-Slim for several reasons:
The single drawback I can see is that the community around Keras is larger, providing more contrib packages/layers and so on. But TensorFlow itself also has a very strong community!
For my general pipeline, I took inspiration from the implementation of common deep CNNs in TF-Slim (https://github.com/tensorflow/models/tree/master/slim). Basically, the pipeline is divided into three main components (directories):
In addition, I kept the two main scripts 'train_image_classifier.py' and 'eval_image_classifier.py' for training and evaluating the CNNs.
### Preprocess the data here.
### Feel free to use as many code cells as needed.
def random_transform(img, max_angle, scale_range):
    """Apply a random transform (rotation + scaling) to an image.
    The transformation parameters are sampled uniformly.
    Used for data augmentation when converting to TFRecords.
    """
    rows, cols, chs = img.shape
    # Rotation and scaling. Note: reversed axes in OpenCV.
    angle = np.random.uniform(low=-max_angle, high=max_angle)
    rot_matrix = cv2.getRotationMatrix2D((cols / 2, rows / 2), angle, 1.)
    img = cv2.warpAffine(img, rot_matrix, (cols, rows),
                         flags=cv2.INTER_LINEAR,
                         borderMode=cv2.BORDER_REFLECT_101)
    # Scaling matrix: keep the image centred after scaling.
    scale_x = np.random.uniform(low=scale_range[0], high=scale_range[1])
    scale_y = np.random.uniform(low=scale_range[0], high=scale_range[1])
    scale_matrix = np.array([[scale_x, 0., (1. - scale_x) * cols / 2.],
                             [0., scale_y, (1. - scale_y) * rows / 2.]],
                            dtype=np.float32)
    img = cv2.warpAffine(img, scale_matrix, (cols, rows),
                         flags=cv2.INTER_LINEAR,
                         borderMode=cv2.BORDER_REFLECT_101)
    return img
idx = 2000
plt.imshow(train['features'][idx])
plt.show()
nb_plots = 3
for i in range(nb_plots**2):
    plt.subplot(nb_plots, nb_plots, i+1)
    plt.imshow(random_transform(train['features'][idx], 20., [0.6, 1.4]))
plt.show()
def preprocess_for_train(image,
                         output_height,
                         output_width,
                         padding=4):
    """Real-time preprocessing of images in the TF-Slim pipeline:
    - random cropping;
    - random brightness and contrast modifications.
    """
    tf.image_summary('image', tf.expand_dims(image, 0))
    # Transform the image to floats.
    image = tf.to_float(image)
    if padding > 0:
        image = tf.pad(image, [[padding, padding], [padding, padding], [0, 0]])
    # Randomly crop a [height, width] section of the image.
    distorted_image = tf.random_crop(image,
                                     [output_height, output_width, 3])
    # Randomly flipping the image horizontally would be
    # a bad idea for traffic signs!
    # distorted_image = tf.image.random_flip_left_right(distorted_image)
    # Random brightness and contrast.
    distorted_image = tf.image.random_brightness(distorted_image,
                                                 max_delta=63)
    distorted_image = tf.image.random_contrast(distorted_image,
                                               lower=0.2, upper=1.8)
    tf.image_summary('distorted_image', tf.expand_dims(distorted_image, 0))
    # Subtract the mean and divide by the variance of the pixels.
    return tf.image.per_image_whitening(distorted_image)
Describe the techniques used to preprocess the data.
Answer:
I tested a few different techniques for preprocessing the data. To begin with, I added some simple random transforms to my TF-Slim preprocessing pipeline:
Finally, every training or testing image is whitened, i.e. the pixel mean is subtracted and the variance normalised to one.
After training some simple models with this initial configuration, I discovered that my network had a tendency to over-fit (accuracy equal to 1 on the training dataset, but not getting better than 0.96 on the testing one). Since adding more L2-regularisation would not completely overcome this problem, I decided to generate additional data using more complex geometric transforms:
Note that I used OpenCV to perform this task, as TensorFlow does not yet handle direct affine or rotation transforms on images. In addition to these transforms, I also balanced the number of images per class in order to avoid an over-representation of some classes during training, which would bias the network.
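The class-balancing step can be sketched with NumPy alone. This is an illustrative simplification (in the actual pipeline, balancing happens while generating the TFRecords files, and each duplicated image additionally goes through `random_transform` for diversity):

```python
import numpy as np

def balance_classes(features, labels, rng=np.random):
    """Oversample minority classes (with replacement) so every class ends up
    with as many examples as the most frequent one."""
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    idx_out = []
    for c, n in zip(classes, counts):
        idx = np.where(labels == c)[0]
        # Draw the missing examples at random from the existing ones.
        extra = rng.choice(idx, size=target - n, replace=True)
        idx_out.append(np.concatenate([idx, extra]))
    idx_out = np.concatenate(idx_out)
    return features[idx_out], labels[idx_out]

# Toy check: 3 images of class 0 and 1 of class 1 -> 3 of each.
X = np.arange(4 * 2 * 2 * 3, dtype=np.uint8).reshape(4, 2, 2, 3)
y = np.array([0, 0, 0, 1])
Xb, yb = balance_classes(X, y)
print(np.bincount(yb))  # -> [3 3]
```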
### Generate additional data (if you want to!)
### and split the data into training/validation/testing sets here.
### Feel free to use as many code cells as needed.
Describe how you set up the training, validation and testing data for your model. If you generated additional data, why?
Answer:
As previously presented, my pipeline follows the structure of TF-Slim. Hence, the data is converted to TFRecords files which can be handled directly by TensorFlow. Complex geometric transforms such as rotation and scaling are applied while generating the TFRecords files. Additional transforms on contrast and brightness are automatically generated while training the network.
import tensorflow as tf
slim = tf.contrib.slim
# Helper for the truncated-normal weights initializers used below
# (as defined in the TF-Slim nets).
trunc_normal = lambda stddev: tf.truncated_normal_initializer(stddev=stddev)
### Define your architecture here.
def cifarnet(images, num_classes=43, is_training=False,
             dropout_keep_prob=0.5,
             prediction_fn=slim.softmax,
             scope='CifarNet'):
    """Creates a variant of the CifarNet model.
    """
    end_points = {}
    with tf.variable_scope(scope, 'CifarNet', [images, num_classes]):
        net = slim.conv2d(images, 64, [5, 5], scope='conv1')
        end_points['conv1'] = net
        net = slim.max_pool2d(net, [2, 2], 2, scope='pool1')
        end_points['pool1'] = net
        net = tf.nn.lrn(net, 4, bias=1.0, alpha=0.001/9.0, beta=0.75, name='norm1')
        net = slim.conv2d(net, 64, [5, 5], scope='conv2')
        end_points['conv2'] = net
        net = tf.nn.lrn(net, 4, bias=1.0, alpha=0.001/9.0, beta=0.75, name='norm2')
        net = slim.max_pool2d(net, [2, 2], 2, scope='pool2')
        end_points['pool2'] = net
        net = slim.flatten(net)
        end_points['Flatten'] = net
        net = slim.fully_connected(net, 384, scope='fc3')
        end_points['fc3'] = net
        net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                           scope='dropout3')
        net = slim.fully_connected(net, 192, scope='fc4')
        end_points['fc4'] = net
        logits = slim.fully_connected(net, num_classes,
                                      biases_initializer=tf.zeros_initializer,
                                      weights_initializer=trunc_normal(1/192.0),
                                      weights_regularizer=None,
                                      activation_fn=None,
                                      scope='logits')
        end_points['Logits'] = logits
        end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
    return logits, end_points
def idsianet(images, num_classes=43, is_training=False,
             dropout_keep_prob=0.5,
             prediction_fn=slim.softmax,
             scope='IdsiaNet'):
    """Creates a variant of the IDSIA model.
    """
    end_points = {}
    with tf.variable_scope(scope, 'IdsiaNet', [images, num_classes]):
        net = slim.conv2d(images, 100, [7, 7], scope='conv1')
        end_points['conv1'] = net
        net = slim.max_pool2d(net, [2, 2], 2, scope='pool1')
        end_points['pool1'] = net
        net = slim.conv2d(net, 150, [4, 4], scope='conv2')
        end_points['conv2'] = net
        net = slim.max_pool2d(net, [2, 2], 2, scope='pool2')
        end_points['pool2'] = net
        net = slim.conv2d(net, 250, [4, 4], scope='conv3')
        end_points['conv3'] = net
        net = slim.max_pool2d(net, [2, 2], 2, scope='pool3')
        end_points['pool3'] = net
        net = slim.flatten(net)
        end_points['Flatten'] = net
        net = slim.fully_connected(net, 300, scope='fc1')
        end_points['fc1'] = net
        net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                           scope='dropout1')
        logits = slim.fully_connected(net, num_classes,
                                      biases_initializer=tf.zeros_initializer,
                                      weights_initializer=trunc_normal(1/300.0),
                                      weights_regularizer=None,
                                      activation_fn=None,
                                      scope='logits')
        end_points['Logits'] = logits
        end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
    return logits, end_points
def atrousnet_valid(images, num_classes=43, is_training=False,
                    dropout_keep_prob=0.5,
                    prediction_fn=slim.softmax,
                    scope='AtrousNet'):
    """Creates a model using dilated (atrous) convolutions.
    """
    end_points = {}
    with tf.variable_scope(scope, 'AtrousNet', [images, num_classes]):
        net = slim.conv2d(images, 64, [3, 3], padding='VALID',
                          weights_regularizer=None, scope='conv1')
        end_points['conv1'] = net
        net = slim.conv2d(net, 128, [3, 3], rate=2, padding='VALID',
                          weights_regularizer=None, scope='conv2')
        end_points['conv2'] = net
        net = slim.max_pool2d(net, [3, 3], 1, scope='pool2', padding='SAME')
        net = slim.conv2d(net, 192, [3, 3], rate=3, padding='VALID',
                          weights_regularizer=None, scope='conv3')
        end_points['conv3'] = net
        # net = slim.max_pool2d(net, [3, 3], 1, scope='pool3', padding='SAME')
        net = slim.conv2d(net, 256, [3, 3], rate=4, padding='VALID',
                          weights_regularizer=None, scope='conv4')
        end_points['conv4'] = net
        # net = slim.max_pool2d(net, [3, 3], 1, scope='pool4', padding='SAME')
        net = slim.conv2d(net, 512, [1, 1], scope='conv5')
        end_points['conv5'] = net
        net = slim.dropout(net, dropout_keep_prob,
                           is_training=is_training,
                           scope='dropout1')
        net = slim.conv2d(net, num_classes+1, [1, 1],
                          biases_initializer=tf.zeros_initializer,
                          weights_initializer=trunc_normal(1 / 512.0),
                          weights_regularizer=None,
                          activation_fn=None,
                          scope='conv6')
        end_points['conv6'] = net
        end_points['PredictionsFull'] = tf.nn.softmax(net)
        # Global average pooling.
        logits = tf.reduce_mean(net, [1, 2], name='pool7')
        end_points['Logits'] = logits
        end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
    return logits, end_points
What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.) For reference on how to build a deep neural network using TensorFlow, see Deep Neural Network in TensorFlow from the classroom.
Answer:
In this project, I tried several convolutional neural network architectures. It was a way for me to gain more expertise in training CNNs, and to build intuition about the advantages and drawbacks of different network designs.
I started with a very standard solution: the classic LeNet-style CNN (implemented above as a CifarNet variant), with the following architecture:
| Layer type | Size | Stride | Output Width |
|---|---|---|---|
| Convolution | 5x5 | 1 | 64 |
| Max Pool | 2x2 | 2 | 64 |
| Convolution | 5x5 | 1 | 64 |
| Max Pool | 2x2 | 2 | 64 |
| Fully Connected | - | - | 384 |
| Dropout p=0.5 | - | - | 384 |
| Fully Connected | - | - | 192 |
| Fully Connected | - | - | 43 |
Even though the previous architecture offers good performance, I looked at networks optimised for the traffic sign dataset. Namely, taking inspiration from the main papers on the topic [1], [2] and [3], I used the following architecture, directly inspired by the last of these:
| Layer type | Size | Stride | Output Width |
|---|---|---|---|
| Convolution | 7x7 | 1 | 100 |
| Max Pool | 2x2 | 2 | 100 |
| Convolution | 4x4 | 1 | 150 |
| Max Pool | 2x2 | 2 | 150 |
| Convolution | 4x4 | 1 | 250 |
| Max Pool | 2x2 | 2 | 250 |
| Fully Connected | - | - | 300 |
| Dropout p=0.5 | - | - | 300 |
| Fully Connected | - | - | 43 |
Finally, I tried a third type of architecture, inspired by more recent work on convolutional neural networks. It is based on the work [1], which makes use of dilated convolutions on segmentation tasks. An important idea of this architecture, which is getting quite popular in the recent literature, is to avoid pooling with stride > 1 and to remove the fully connected layers, replacing them with 1x1 convolutions. The main advantage of this solution is that it directly generates a softmax probability map, hence providing classification probabilities at every pixel of an image. Namely, in the case of this project, I came up with the following architecture:
| Layer type | Size | Stride | Rate | Output Width |
|---|---|---|---|---|
| Convolution | 3x3 | 1 | 1 | 64 |
| Convolution | 3x3 | 1 | 2 | 128 |
| Max Pool | 3x3 | 1 | 1 | 128 |
| Convolution | 3x3 | 1 | 3 | 192 |
| Convolution | 3x3 | 1 | 4 | 256 |
| Convolution | 1x1 | 1 | 1 | 512 |
| Dropout p=0.5 | - | - | - | 512 |
| Convolution | 1x1 | 1 | 1 | 43 |
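As a sanity check on the table above: with all strides equal to 1, the receptive field simply grows by the effective dilated kernel size minus one at each layer, and the VALID convolutions shrink the output map by the same amount. A small computation (layer list transcribed from the table; the pool is SAME-padded as in the code above):

```python
# Receptive-field and output-size check for the AtrousNet stack.
# k_eff = k + (k - 1) * (rate - 1) is the effective dilated kernel size.
layers = [            # (kernel, rate, padding)
    (3, 1, 'VALID'),  # conv1
    (3, 2, 'VALID'),  # conv2
    (3, 1, 'SAME'),   # pool2, stride 1
    (3, 3, 'VALID'),  # conv3
    (3, 4, 'VALID'),  # conv4
    (1, 1, 'SAME'),   # conv5 (1x1)
    (1, 1, 'SAME'),   # conv6 (1x1)
]
rf, size = 1, 32      # receptive field and spatial size for a 32x32 input
for k, rate, padding in layers:
    k_eff = k + (k - 1) * (rate - 1)
    rf += k_eff - 1               # stride is always 1, so the jump stays 1
    if padding == 'VALID':
        size -= k_eff - 1
print(rf, size)  # -> 23 12
```

So the final 1x1 classifier sees a 23x23 receptive field and, on a 32x32 input, produces a 12x12 probability map; this matches the padding computation `(image.shape[0] - pmap.shape[0]) // 2` used when overlaying the mask later in the notebook.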
### Train your model here.
### Feel free to use as many code cells as needed.
After training the previous architectures, here are the accuracy results on the testing dataset:
| Architecture | Test-accuracy |
|---|---|
| CifarNet | 96.3% |
| IdsiaNet | 98.06% |
| AtrousNet | 98.15% |
Evaluating the classifier can be done easily using the following command (with the correct dataset directory provided):
DATASET_DIR=../traffic-signs-data/GTSRB_size32
CHECKPOINT_FILE=checkpoints/atrousnet_valid.ckpt
python eval_image_classifier.py \
--alsologtostderr \
--checkpoint_path=${CHECKPOINT_FILE} \
--dataset_dir=${DATASET_DIR} \
--dataset_name=gtsrb_32 \
--dataset_split_name=test \
--model_name=atrousnet_valid
How did you train your model? (Type of optimizer, batch size, epochs, hyperparameters, etc.)
Answer:
In order to find proper hyperparameters for training my models, I ran short training runs, stopping after 5k iterations. It is a quick way to compare different hyperparameters (learning rate, batch size, optimiser parameters) and select the best ones.
In the end, I used the following sets of parameters (up to small variations):
What approach did you take in coming up with a solution to this problem?
Answer:
There is quite a large literature on traffic sign classification and neural networks, so it is better to take inspiration from it! Hence, I looked mostly at the important articles on this problem, and more broadly, on the topic of classification and segmentation using convolutional neural networks.
Concerning the training, I similarly used a standard learning rate and optimiser to obtain proper training. Comparing results on the training and testing datasets, I came up with the idea of augmenting the initial dataset with rotation and scaling transforms. Starting with small transform parameters, I gradually increased their scale to avoid over-fitting.
During this project, and more particularly, while training these different architectures, I ran into a few unexpected problems:
Take several pictures of traffic signs that you find on the web or around you (at least five), and run them through your classifier on your computer to produce example results. The classifier might not recognize some local signs but it could prove interesting nonetheless.
You may find signnames.csv useful as it contains mappings from the class id (integer) to the actual sign name.
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.
### Load the images and plot them here.
### Feel free to use as many code cells as needed.
# Load CSV description...
import csv
with open('signnames.csv', 'r') as f:
    creader = csv.reader(f)
    csv_list = list(creader)
class_names = [a[1] for a in csv_list[1:]]
class_names.append('NOTHING! NOTHING! NOTHING!')
import tensorflow as tf
slim = tf.contrib.slim
from nets import idsianet
from nets import atrousnet
from preprocessing import gtsrb_32_preprocessing
ckpt_file_idsianet = './checkpoints/idsianet.ckpt'
ckpt_file_atrousnet = './checkpoints/atrousnet_valid.ckpt'
ckpt_file_atrousnet2 = './checkpoints/atrousnet_same.ckpt'
img_shape = (idsianet.idsianet.default_image_size, idsianet.idsianet.default_image_size, 3)
def idsianet_model(image, shape):
    """Return a preprocessing + model with image as input.
    """
    img_size = idsianet.idsianet.default_image_size
    image = gtsrb_32_preprocessing.preprocess_for_eval(image, shape[0], shape[1])
    image = tf.reshape(image, shape=(1, *shape))
    with slim.arg_scope(idsianet.idsianet_arg_scope()):
        logits, end_points = idsianet.idsianet(image)
    return logits, end_points
def atrousnet_model(image, shape):
    """Return a preprocessing + model with image as input.
    """
    img_size = atrousnet.atrousnet_valid.default_image_size
    image = gtsrb_32_preprocessing.preprocess_for_eval(image, shape[0], shape[1])
    image = tf.reshape(image, shape=(1, *shape))
    with slim.arg_scope(atrousnet.atrousnet_valid_arg_scope()):
        logits, end_points = atrousnet.atrousnet_valid(image)
    return logits, end_points
def atrousnet_model_same(image, shape):
    """Return a preprocessing + model with image as input.
    """
    img_size = atrousnet.atrousnet_same.default_image_size
    image = gtsrb_32_preprocessing.preprocess_for_eval(image, shape[0], shape[1])
    image = tf.reshape(image, shape=(1, *shape))
    with slim.arg_scope(atrousnet.atrousnet_same_arg_scope()):
        logits, end_points = atrousnet.atrousnet_same(image)
    return logits, end_points
def eval_model(image, model, ckpt_file):
    """Evaluate the model on one image.
    Terribly unoptimised (everything is reloaded at each call), but it does the job!
    return: predicted class + probabilities of each class.
    """
    with tf.Graph().as_default():
        # Image placeholder and model.
        img_shape = image.shape
        img_input = tf.placeholder(shape=img_shape, dtype=tf.uint8)
        logits, end_points = model(img_input, img_shape)
        with tf.Session() as session:
            # Restore the checkpoint.
            saver = tf.train.Saver()
            saver.restore(session, ckpt_file)
            # Run the model.
            output = session.run(end_points, feed_dict={img_input: image})
            probabilities = output['Predictions'][0]
    return np.argmax(probabilities), probabilities, output
Choose five candidate images of traffic signs and provide them in the report. Are there any particular qualities of the image(s) that might make classification difficult? It would be helpful to plot the images in the notebook.
Answer:
I chose several random images on the internet. As we can see below, several factors make the classification challenging:
### Run the predictions here.
### Feel free to use as many code cells as needed.
import os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
def weighted_img(img, initial_img, α=0.8, β=1., λ=0.):
    return cv2.addWeighted(initial_img, α, img, β, λ)
# Test the predictions on the images directory
list_imgs = os.listdir("images/")
for img_path in list_imgs:
    if img_path[-4:] == '.jpg':
        image = mpimg.imread('images/' + img_path)
        # Prediction.
        idx, probabilities, output = eval_model(image, atrousnet_model, ckpt_file_atrousnet)
        pmap = output['PredictionsFull'][0]
        # Top 5 predictions.
        top_k = np.argsort(probabilities)[:-6:-1] + 1
        # Mask image.
        pad = (image.shape[0] - pmap.shape[0]) // 2
        mask_pad = np.pad(pmap[:, :, idx], pad, 'constant', constant_values=0.0) > 0.99
        mask_img = np.zeros(image.shape, dtype=np.uint8)
        mask_img[:, :, 1] = mask_pad * 255
        # Plot image with probability mask.
        with sns.axes_style("white"):
            plt.subplot(1, 2, 1)
            plt.imshow(weighted_img(image, mask_img))
            plt.title('Predictions: %s\n Index: %i\n Top-5: %s' % (class_names[idx], idx+1, top_k))
            # Bar plot with softmax probabilities.
            plt.subplot(1, 2, 2)
            plt.bar(list(range(len(probabilities))), probabilities, 1 / 1.5)
            plt.ylabel('Softmax Probabilities')
            plt.show()
Is your model able to perform equally well on captured pictures or a live camera stream when compared to testing on the dataset?
Answer:
Despite being tried on challenging images quite different from the training dataset, the model seems quite robust, mistaking only one image (class 36 instead of 37), and still classifying it in the closest category.
Since the AtrousNet model outputs softmax probabilities at every pixel, I also displayed these probabilities for the chosen class as a green mask. It is interesting to see that the high-probability zone gets smaller when partial occlusion or rotation exists (see for instance the second no-entry sign); nevertheless, the classifier still recognises the images correctly thanks to a small zone with very high probability values.
The AtrousNet architecture is clearly interesting in that respect, as generating a full softmax probability map could directly help to build a segmentation and tracking pipeline on videos. For instance, we show below how the exact same network can recognise two different traffic signs in the same image.
### Visualize the softmax probabilities here.
### Feel free to use as many code cells as needed.
img_path = 'img12.jpg'
image = mpimg.imread('images/' + img_path)
# Prediction?
idx, probabilities, output = eval_model(image, atrousnet_model, ckpt_file_atrousnet)
pmap = output['PredictionsFull'][0]
# Top 5 predictions.
top_k = np.argsort(probabilities)[:-6:-1]
# Mask image
# mask_pad = np.pad(np.sum(pmap > 0.99999, axis=2), 10, 'constant', constant_values=0.0) > 0.0
pad = (image.shape[0]-pmap.shape[0]) // 2
mask_pad1 = np.pad(pmap[:,:,top_k[0]], pad, 'constant', constant_values=0.0) > 0.999
mask_pad2 = np.pad(pmap[:,:,top_k[1]], pad, 'constant', constant_values=0.0) > 0.999
mask_img = np.zeros(image.shape, dtype=np.uint8)
mask_img[:,:,1] = mask_pad1 * 255
mask_img[:,:,0] = mask_pad2 * 255
# Plot image with probability mask.
with sns.axes_style("white"):
    plt.subplot(1, 2, 1)
    plt.imshow(weighted_img(image, mask_img))
    plt.title('Top-5: %s' % (top_k+1))
    # Bar plot with softmax probabilities.
    plt.subplot(1, 2, 2)
    plt.bar(list(range(len(probabilities))), probabilities, 1 / 1.5)
    plt.ylabel('Softmax Probabilities')
    plt.show()
Use the model's softmax probabilities to visualize the certainty of its predictions, tf.nn.top_k could prove helpful here. Which predictions is the model certain of? Uncertain? If the model was incorrect in its initial prediction, does the correct prediction appear in the top k? (k should be 5 at most)
Answer:
On all the predictions, even when mistaken, the model is quite certain (p > 0.95). We may nevertheless note that in the one mistaken case, the correct class ranks second, even though quite far behind in probability.
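The ranking described above can be reproduced with plain NumPy on a hypothetical probability vector (the class indices 36/37 mirror the one mistaken image; the probability values are made up for illustration):

```python
import numpy as np

# Hypothetical softmax output over 43 classes: a confident but wrong top-1
# prediction, with the correct class ranked second.
probabilities = np.zeros(43)
probabilities[36] = 0.97   # predicted class (wrong, but certain)
probabilities[37] = 0.02   # correct class, far behind in probability
probabilities[12] = 0.01

# Top-5 indices in decreasing probability (numpy analogue of tf.nn.top_k).
top5 = np.argsort(probabilities)[::-1][:5]
print(top5[:3], probabilities[top5[0]])  # -> [36 37 12] 0.97
```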
If necessary, provide documentation for how an interface was built for your model to load and classify newly-acquired images.
Answer:
Note: Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the iPython Notebook as an HTML document. You can do this by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.
The accuracy of the classifiers could probably be improved (the state of the art is around 99.5%). Compared to the literature, an important factor is that we are working on 32x32 images instead of 48x48; a higher resolution would certainly help increase performance. In addition, one could also explore the following strategies:
Finally, it could be fun to apply the pipeline to videos and check whether it can directly recognise traffic signs in them.